Skip to content

POC: Encryption read support for REST catalog#3221

Draft
smaheshwar-pltr wants to merge 2 commits intoapache:mainfrom
smaheshwar-pltr:sm/rest-encryption-read
Draft

POC: Encryption read support for REST catalog#3221
smaheshwar-pltr wants to merge 2 commits intoapache:mainfrom
smaheshwar-pltr:sm/rest-encryption-read

Conversation

@smaheshwar-pltr
Copy link
Copy Markdown
Contributor

@smaheshwar-pltr smaheshwar-pltr commented Apr 7, 2026

Requires

Proof-of-concept:

  • Built required JARs locally from the Iceberg PR above
  • Built PyArrow locally from the arrow PR above
  • Integration test of Spark writing an encrypted table that's then read by PyIceberg passes 🎉:
@pytest.mark.integration
def test_read_encrypted_table_via_spark(session_catalog: Catalog) -> None:
    table_name = "default.test_encrypted_spark_read"

    # Configure KMS via py-kms-impl property with the same master keys as Java's UnitestKMS
    session_catalog.properties["py-kms-impl"] = "pyiceberg.encryption.kms.InMemoryKms"
    session_catalog.properties["encryption.kms.key.keyA"] = b"0123456789012345".hex()
    session_catalog.properties["encryption.kms.key.keyB"] = b"1123456789012345".hex()

    tbl = session_catalog.load_table(table_name)

    # Verify the table has encryption metadata
    assert tbl.metadata.properties.get("encryption.key-id") == "keyA"
    assert len(tbl.metadata.encryption_keys) > 0, "Expected encryption keys in table metadata"

    if tbl.metadata.current_snapshot_id is not None:
        snapshot = tbl.metadata.snapshot_by_id(tbl.metadata.current_snapshot_id)
        assert snapshot is not None
        assert snapshot.key_id is not None, "Expected key_id on snapshot"

    # Read the encrypted data via PyIceberg
    result = tbl.scan().to_arrow()

    # Verify the data matches what Spark wrote
    assert result.num_rows == 3, f"Expected 3 rows, got {result.num_rows}"

    # Sort by id for deterministic comparison
    result = result.sort_by("id")

    ids = result.column("id").to_pylist()
    data = result.column("data").to_pylist()
    values = result.column("value").to_pylist()

    assert ids == [1, 2, 3], f"Expected ids [1,2,3], got {ids}"
    assert data == ["alice", "bob", "charlie"], f"Expected data ['alice','bob','charlie'], got {data}"
    assert values == [1.0, 2.0, 3.0], f"Expected values [1.0,2.0,3.0], got {values}"

Rationale for this change

Are these changes tested?

Are there any user-facing changes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant